HDDS-14669. Implement a new async finalize command which does not block on the server #10152
sodonnel merged 14 commits into apache:HDDS-14496-zdu
…zation command and poll using new status command
errose28 left a comment
Overall looks good, just left some minor comments. The UpgradeFinalizer parts of this change will be replaced as part of HDDS-15129 but this is a good starting point for the switch.
    if (status == FINALIZATION_REQUIRED) {
      finalizationExecutor.execute(service, this);
    }
  } catch (NotLeaderException e) {
This should automatically be propagated back to the client without any extra handling required.
I guess it won't propagate back with the catch block in place, so I think we should re-throw after the log?
I don't think the catch or the log is necessary. This is now a single Ratis request, so it works the same way as others such as close container and close pipeline. If the contacted node is not the leader, finalization does not happen at all, and the client's failover proxy should retry on the leader automatically.
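To illustrate the point about letting the exception propagate, here is a minimal, self-contained sketch. Everything in it is hypothetical: `NotLeaderException` is a local stand-in for the Ratis exception, and `Node` and `finalizeWithFailover` are invented for illustration, not the actual SCM handler or failover proxy. It assumes only what the comment above describes: a follower rejects the request without doing any work, and the client side moves on to another node.

```java
import java.util.List;

public class FailoverSketch {
  /** Local stand-in for the Ratis exception; not the real class. */
  public static class NotLeaderException extends RuntimeException {
    public NotLeaderException(String node) {
      super(node + " is not the leader");
    }
  }

  /** Hypothetical SCM node: only the leader accepts the finalize request. */
  public static class Node {
    public final String name;
    public final boolean leader;
    public boolean finalizeStarted;

    public Node(String name, boolean leader) {
      this.name = name;
      this.leader = leader;
    }

    /** No catch block here: a follower simply lets the exception escape. */
    public void finalizeUpgrade() {
      if (!leader) {
        throw new NotLeaderException(name);
      }
      finalizeStarted = true;
    }
  }

  /** Hypothetical failover proxy: tries each node until one accepts. */
  public static String finalizeWithFailover(List<Node> nodes) {
    for (Node node : nodes) {
      try {
        node.finalizeUpgrade();
        return node.name; // the node that accepted the request
      } catch (NotLeaderException e) {
        // Nothing happened on this node; fall through to the next one.
      }
    }
    throw new IllegalStateException("no leader found");
  }
}
```

If the server-side handler caught and swallowed the exception instead, the proxy in this sketch would treat the follower's response as success and never reach the leader.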
errose28 left a comment
The only functional comment I have left is on the NotLeaderException handling. Everything else LGTM.
What changes were proposed in this pull request?
The original SCM finalize command blocked until SCM and enough datanodes had finalized. In the new design, the finalize command kicks off the finalization process and returns immediately. Any process that needs to track progress must call the finalize status command to check whether finalization has completed.
This change adds a new protobuf message that triggers finalization and returns, and a new "ozone admin upgrade finalize" command that triggers it from the CLI.
The existing tests are adjusted to call the new command and then poll the status command until finalization has completed.
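The trigger-then-poll flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `UpgradeClient`, `startFinalize`, `queryStatus`, and `finalizeAndWait` are hypothetical names, and the `Status` values other than `FINALIZATION_REQUIRED` are assumed for the example.

```java
import java.time.Duration;

public class FinalizePoller {
  /** Assumed status values; only FINALIZATION_REQUIRED appears in this PR's diff. */
  public enum Status { FINALIZATION_REQUIRED, FINALIZATION_IN_PROGRESS, FINALIZATION_DONE }

  /** Hypothetical client interface standing in for the real SCM client. */
  public interface UpgradeClient {
    void startFinalize();   // returns as soon as the request is accepted
    Status queryStatus();   // non-blocking status query
  }

  /**
   * Triggers finalization once, then polls the status until it reports done.
   * Returns true if finalization completed, false on timeout or interrupt.
   */
  public static boolean finalizeAndWait(UpgradeClient client, Duration timeout,
      Duration interval) {
    client.startFinalize();
    long deadline = System.nanoTime() + timeout.toNanos();
    while (client.queryStatus() != Status.FINALIZATION_DONE) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up waiting; finalization may still be running
      }
      try {
        Thread.sleep(interval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }
}
```

A caller that only wants to start finalization would stop after `startFinalize()`; the polling loop is needed only by callers, such as tests, that must wait for completion.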
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-14669
How was this patch tested?
Existing integration tests modified to call the new flow.
Added a simple test to validate the new CLI command.